Given a conventional FC layer, we denote $w_i \in \mathbb{R}^{m_i}$ and $a_i \in \mathbb{R}^{C_i}$ as its weights and features in the $i$-th layer, where $m_i = C_i \times C_{i-1}$ and $C_i$ represents the number of output channels of the $i$-th layer. Then we have the following:

$$a_i = a_{i-1} \otimes w_i, \qquad (6.40)$$

where $\otimes$ denotes full-precision multiplication. As mentioned above, the BNN model aims to binarize $w_i$ and $a_i$ into $P_{RB}(w_i)$ and $P_{RB}(a_i)$. For simplicity, in this chapter we denote $P_{RB}(w_i)$ and $P_{RB}(a_i)$ as $b_{w_i} \in \mathbb{B}^{m_i}$ and $b_{a_i} \in \mathbb{B}^{C_i}$, respectively.

Then, we use efficient XNOR and Bit-count operations to replace the full-precision operations. Following [199], the forward process of the BNN is

$$a_i = b_{a_{i-1}} \odot b_{w_i}, \qquad (6.41)$$

where $\odot$ represents efficient XNOR and Bit-count operations.
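To make the XNOR/Bit-count replacement concrete, below is a minimal sketch in plain Python/NumPy (function and variable names are ours, chosen for illustration) of the identity behind Eq. 6.41: for two $\{-1,+1\}$ vectors packed as bits, the dot product equals $2 \cdot \mathrm{popcount}(\mathrm{XNOR}) - n$, so no multiply-accumulate is needed.

```python
import numpy as np

def binary_dot_xnor(x_bits: int, y_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors of length n, each packed into the
    low n bits of an int (bit=1 encodes +1, bit=0 encodes -1), computed with
    an XNOR followed by a bit-count instead of multiply-accumulate."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(x_bits ^ y_bits) & mask    # 1 wherever the two signs agree
    agree = bin(xnor).count("1")        # bit-count (popcount)
    return 2 * agree - n                # (#agree) - (#disagree)

# Sanity check against ordinary full-precision multiplication.
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=16)
b = rng.choice([-1, 1], size=16)
pack = lambda v: int("".join("1" if s > 0 else "0" for s in v), 2)
assert binary_dot_xnor(pack(a), pack(b), 16) == int(a @ b)
```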

Based on XNOR-Net [199], we introduce a learnable channel-wise scale factor to modulate the amplitude of the real-valued convolution. Aligned with the Batch Normalization (BN) and activation layers, the process is formulated as

$$b_{a_i} = \operatorname{sign}\big(\Phi(\alpha_i \circ b_{a_{i-1}} \odot b_{w_i})\big), \qquad (6.42)$$

where we divide the data flow in POEM into units for a detailed discussion. In POEM, the original output feature $a_i$ is first scaled by a channel-wise scale factor (vector) $\alpha_i \in \mathbb{R}^{C_i}$ to modulate the amplitude of its full-precision counterpart. It then enters $\Phi(\cdot)$, which represents a composite function built by stacking several layers, e.g., the BN layer, the non-linear activation layer, and the max-pooling layer. The output is then binarized through the sign function to obtain the binary activations $b_{a_i} \in \mathbb{B}^{C_i}$, where $\operatorname{sign}(\cdot)$ returns $+1$ if the input is greater than zero and $-1$ otherwise. The 1-bit activation $b_{a_i}$ can then be used for the efficient XNOR and Bit-count operations of the $(i+1)$-th layer.
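As an illustration of Eq. 6.42, the following is a minimal PyTorch sketch of one such unit; the class and attribute names are ours rather than from the POEM implementation, the XNOR/Bit-count kernel is emulated by an ordinary matrix product of $\{-1,+1\}$ tensors, and the non-differentiability of $\operatorname{sign}(\cdot)$ (typically handled with a straight-through estimator during training) is ignored.

```python
import torch
import torch.nn as nn

class BiFCUnit(nn.Module):
    """Sketch of one Bi-FC unit following Eq. 6.42: binary matmul ->
    channel-wise scale alpha -> Phi (BN + activation) -> sign."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels))
        self.alpha = nn.Parameter(torch.ones(out_channels))  # channel-wise scale
        self.bn = nn.BatchNorm1d(out_channels)                # part of Phi(.)
        self.act = nn.Hardtanh()                              # part of Phi(.)

    def forward(self, b_a_prev: torch.Tensor) -> torch.Tensor:
        b_w = torch.sign(self.weight)      # 1-bit weights b_w
        a = b_a_prev @ b_w.t()             # stands in for XNOR + Bit-count
        a = self.alpha * a                 # modulate the amplitude
        a = self.act(self.bn(a))           # Phi(.)
        return torch.sign(a)               # 1-bit activations b_a

# Example: 1-bit input activations of a previous layer feeding one unit.
x = torch.sign(torch.randn(8, 64))
unit = BiFCUnit(64, 128)
out = unit(x)                              # shape (8, 128), values in {-1, +1}
```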

6.3.3 Supervision for POEM

To constrain Bi-FC to have binarized weights with amplitudes similar to their real-valued counterparts, we introduce a new loss function in our supervision for POEM. We consider that the unbinarized weights should be reconstructable from the binarized weights, as revealed in Eq. 6.38. Accordingly, we define the reconstruction loss as

$$L_R = \frac{1}{2}\,\big\|w_i - \alpha_i \circ b_{w_i}\big\|_2^2, \qquad (6.43)$$

where $L_R$ is the reconstruction loss.
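For concreteness, a one-layer version of Eq. 6.43 might look as follows in PyTorch; this is a sketch under our own naming assumptions, treating $\alpha_i$ as a per-output-channel vector that scales the rows of the weight matrix.

```python
import torch

def reconstruction_loss(w: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """L_R of Eq. 6.43 for one layer: 0.5 * ||w - alpha o sign(w)||_2^2,
    where w has shape (C_out, C_in) and alpha has shape (C_out,)."""
    b_w = torch.sign(w)                    # binarized weights b_w
    recon = alpha.unsqueeze(1) * b_w       # alpha o b_w, applied channel-wise
    return 0.5 * (w - recon).pow(2).sum()
```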

Taking into account the impact of $\alpha_i$ on the layer output, we define the learning objective of our POEM as

$$\arg\min_{\{w_i,\,\alpha_i,\,p_i\},\; i \in N} \; L_S(w_i, \alpha_i, p_i) + \lambda L_R(w_i, \alpha_i), \qquad (6.44)$$

where $p_i$ denotes the other parameters of the real-valued layers in the network, e.g., the BN layer, the activation layer, and the unbinarized fully-connected layer; $N$ denotes the number of layers in the network; and $L_S$ is the cross-entropy loss.

Here $\lambda$ is a hyperparameter. Unlike binarization methods such as XNOR-Net [199] and Bi-Real Net [159], where only the reconstruction loss is considered in the weight calculation, our discrete optimization method computes the Bi-FC layers by considering the reconstruction loss and the softmax loss in a unified framework. By fine-tuning the value of $\lambda$, our proposed POEM achieves much better performance than XNOR-Net, which shows the effectiveness of the combined loss over the softmax loss alone.
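A hypothetical training step combining the two terms of Eq. 6.44 might look as follows in PyTorch; the names (`poem_training_step`, `lam`, the per-layer `weight`/`alpha` attributes) and the value of $\lambda$ are our own assumptions rather than POEM's actual implementation, and the straight-through handling of the sign function is again omitted.

```python
import torch
import torch.nn.functional as F

def layer_recon_loss(w: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    # L_R of Eq. 6.43 for one layer: 0.5 * ||w - alpha o sign(w)||_2^2
    return 0.5 * (w - alpha.unsqueeze(1) * torch.sign(w)).pow(2).sum()

def poem_training_step(model, optimizer, x, target, lam=1e-4):
    """One optimization step on L_S + lambda * L_R (Eq. 6.44), assuming
    `model` is a stack of binary FC units, each exposing a real-valued
    .weight matrix and a channel-wise .alpha vector."""
    optimizer.zero_grad()
    logits = model(x)
    loss_s = F.cross_entropy(logits, target)        # cross-entropy loss L_S
    loss_r = sum(layer_recon_loss(m.weight, m.alpha)
                 for m in model.modules() if hasattr(m, "alpha"))
    loss = loss_s + lam * loss_r                    # combined objective, Eq. 6.44
    loss.backward()
    optimizer.step()
    return loss.item()
```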